Automatic indexing of scanned documents: a layout-based approach
Authors
Abstract
Archiving official written documents such as invoices, reminders and account statements is becoming increasingly important in both business and private settings. Creating appropriate index entries for document archives, such as the sender's name, creation date or document number, is tedious manual work. We present a novel approach to handle automatic indexing of documents based on generic positional extraction of
Similar resources
Automatic Analysis and indexing of variable-layout documents
In this paper a methodology for analysis and automatic indexing of imaged documents within an archiving and retrieval system is described. This system, which is being developed within the Esprit project STRETCH (STorage and RETrieval by Content of imaged documents), is based on a new generation Archiving and Retrieval Engine (ARE), which overcomes the bottleneck of document profiling by allevia...
Retrieving Images of Scanned Text Documents
Information retrieval is the task of finding documents, usually text, which are relevant to a user's information need. A conventional approach to information management of paper documents is normally based on classifying them into a hierarchical classification structure. More recently we have seen electronic document management systems which manage scanned images of documents in the same way as pa...
Automatic Indexing for Storage and Retrieval of Line Drawings
The usefulness of a collection of scanned graphical documents can be measured by the facilities available for their retrieval. We present an approach for indexing a collection of line drawings automatically. The indexing is based on the textual and graphical content of the drawings. This approach has been developed to facilitate 'retrieval by example' in heterogeneous collections of graphical doc...
A Tool for Arabic Documents Indexing and Retrieval From a Web Virtual Library
This paper presents a method for automatic indexing and retrieval of Arabic documents from a virtual library. The library can be multilingual and contains documents written in different languages. All the documents are scanned in order to be stored in the library. The indexing method consists of using the document contents as indexes. They are first scanned and then submitted to a...
Local Thresholding Algorithm Based on Variable Window Size Statistics
In an automatic document conversion system, which builds digital documents from scanned articles, there is a need to perform various adjustments before the scanned image is fed to the layout analysis system. This is because the layout detection system is sensitive to errors when the page elements are not properly identified, represented, denoised, etc. Such an adjustment is the detection of for...